Producing a Test Collection for Patent Machine Translation in the Seventh NTCIR Workshop
Abstract
To support research and development on machine translation, we produced a test collection for Japanese-English machine translation in the Seventh NTCIR Workshop. This paper describes the test collection in detail. From patent documents published in Japan and the United States, we extracted patent families as a parallel corpus. A patent family is a set of patent documents for the same or related inventions, usually filed in more than one country and in different languages. In the parallel corpus, we aligned Japanese sentences with their English counterparts. The test collection, which includes approximately 2 000 000 sentence pairs, can be used to train and test machine translation systems. It also includes search topics for cross-lingual patent retrieval, so the contribution of machine translation to a patent retrieval task can also be evaluated. The test collection will be available to the public for research purposes after the NTCIR final meeting.
Similar resources
Overview of the Patent Translation Task at the NTCIR-7 Workshop
To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation and performed the Patent Translation Task at the Seventh NTCIR Workshop. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2 000 000 sen...
Overview of the Patent Machine Translation Task at the NTCIR-9 Workshop
This paper gives an overview of the Patent Machine Translation Task (PatentMT) at NTCIR-9 by describing the test collection, evaluation methods, and evaluation results. We organized three patent machine translation subtasks: Chinese to English, Japanese to English, and English to Japanese. For these subtasks, we provided large-scale test collections, including training data, development data an...
Exploiting Patent Information for the Evaluation of Machine Translation
We have produced a test collection for machine translation (MT). Our test collection includes approximately 2 000 000 sentence pairs in Japanese and English, which were extracted from patent documents and can be used to train and evaluate MT systems. Our test collection also includes search topics for crosslingual information retrieval, to evaluate the contribution of MT to retrieving patent do...
Overview of the Patent Machine Translation Task at the NTCIR-10 Workshop
This paper gives an overview of the Patent Machine Translation Task (PatentMT) at NTCIR-10 by describing its evaluation methods, test collections, and evaluation results. We organized three patent machine translation subtasks: Chinese to English, Japanese to English, and English to Japanese. For these subtasks, we provided large-scale test collections, including training data, development data ...
The Patent Mining Task in the Seventh NTCIR Workshop
This paper introduces the Patent Mining Task in the Seventh NTCIR Workshop, which is currently in progress, and the test collections produced in this task. Its goal is the classification of research papers written either in Japanese or in English into the International Patent Classification (IPC) system, which is a global standard patent classification system.
Publication date: 2008